feat: cluster autoscaler #251

Merged
merged 2 commits into main from feat.autoscaling
Apr 18, 2024

Conversation

cbzzz
Contributor

@cbzzz cbzzz commented Apr 15, 2024

What type of PR is this?
/kind feature

What this PR does / why we need it:
Adds a new cluster-autoscaler flavor that provides an add-on to autoscale workload cluster nodes via Cluster Autoscaler.

Testing

  1. In addition to your workload cluster environment variables, set up the new autoscaling variables:
$ export CLUSTER_AUTOSCALER_VERSION=v1.29.0
# Optional: If specified, these values must be explicitly quoted!
$ export WORKER_MACHINE_MIN='"1"'
$ export WORKER_MACHINE_MAX='"10"'
  2. Create a cluster using the Cluster Autoscaler flavor:
$ clusterctl generate cluster ${CLUSTER_NAME} \
  --infrastructure linode:0.0.0 \
  --flavor cluster-autoscaler \
  | kubectl apply -f -
  3. When the Cluster is Ready, download the kubeconfig file and deploy any workload:
$ kubectl get secret ${CLUSTER_NAME}-kubeconfig -o jsonpath='{.data.value}' \
  | base64 -d - > /tmp/${CLUSTER_NAME}-kubeconfig

# Example workload
$ KUBECONFIG=/tmp/${CLUSTER_NAME}-kubeconfig kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 0
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          requests:
            memory: 1Gi
            cpu: 1000m
EOF
  4. Scale the workload beyond the workload cluster's capacity to trigger a scale-up event:
$ KUBECONFIG=/tmp/${CLUSTER_NAME}-kubeconfig kubectl scale deployment/nginx --replicas 2
  5. The Cluster Autoscaler should scale up any detected MachineSets, MachineDeployments, or MachinePools to meet scheduling requirements (a way to check which node groups were discovered is sketched after the events below):
# Autoscaler logs on management cluster
$ kubectl logs deploy/${CLUSTER_NAME}-cluster-autoscaler
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:10:25.581700       1 clusterapi_provider.go:68] discovered node group: MachineDeployment/default/cbzzz-capl-md-0 (min: 1, max: 10, replicas: 1)
...
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:10:25.582547       1 orchestrator.go:185] Estimated 1 nodes needed in MachineDeployment/default/cbzzz-capl-md-0
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:10:25.582577       1 orchestrator.go:291] Final scale-up plan: [{MachineDeployment/default/cbzzz-capl-md-0 1->2 (max: 10)}]
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:10:25.582628       1 executor.go:147] Scale-up: setting group MachineDeployment/default/cbzzz-capl-md-0 size to 2

# Node events on workload cluster
$ KUBECONFIG=/tmp/${CLUSTER_NAME}-kubeconfig kubectl events --for node/
6m5s                    Warning   FailedScheduling          Pod/nginx-5b656d96b5-vhwxc            0/2 nodes are available: 1 Insufficient cpu, 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling.
5m53s                   Normal    TriggeredScaleUp          Pod/nginx-5b656d96b5-vhwxc            pod triggered scale-up: [{MachineDeployment/default/cbzzz-capl-md-0 1->2 (max: 10)}]
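For reference, the autoscaler's Cluster API provider discovers node groups from the min/max scaling annotations on Cluster API objects. A hedged way to confirm what was discovered, run against the management cluster (the MachineDeployment name is an assumption based on the ${CLUSTER_NAME}-md-0 pattern seen in the logs above):
# Assumed resource name; adjust to match your MachineDeployment
$ kubectl get machinedeployment ${CLUSTER_NAME}-md-0 -o jsonpath='{.metadata.annotations}'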
  6. Delete the workload:
$ KUBECONFIG=/tmp/${CLUSTER_NAME}-kubeconfig kubectl delete deployment/nginx
  7. The Cluster Autoscaler should scale down any associated MachineSets, MachineDeployments, or MachinePools. You may need to restart the Cluster Autoscaler with the --scale-down-unneeded-time=1s setting for a quicker reaction time (one way to set this flag is sketched after the events below):
# Autoscaler logs on management cluster
$ kubectl logs deploy/${CLUSTER_NAME}-cluster-autoscaler
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.258406       1 clusterapi_provider.go:68] discovered node group: MachineDeployment/default/cbzzz-capl-md-0 (min: 1, max: 10, replicas: 2)
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.258753       1 clusterapi_controller.go:714] node "cbzzz-capl-md-0-2skxq-tgrpd" is in nodegroup "MachineDeployment/default/cbzzz-capl-md-0"
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.259341       1 clusterapi_controller.go:714] node "cbzzz-capl-md-0-2skxq-dtld8" is in nodegroup "MachineDeployment/default/cbzzz-capl-md-0"
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.259642       1 klogx.go:87] Node cbzzz-capl-md-0-2skxq-dtld8 - cpu requested is 5% of allocatable
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.259665       1 eligibility.go:104] Scale-down calculation: ignoring 1 nodes unremovable in the last 5m0s
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.259683       1 cluster.go:156] Simulating node cbzzz-capl-md-0-2skxq-dtld8 removal
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.259719       1 cluster.go:174] node cbzzz-capl-md-0-2skxq-dtld8 may be removed
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.259733       1 nodes.go:84] cbzzz-capl-md-0-2skxq-dtld8 is unneeded since 2024-04-17 14:19:35.535959819 +0000 UTC m=+623.701969197 duration 22.916044912s
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:19:59.260165       1 static_autoscaler.go:617] Scale down status: lastScaleUpTime=2024-04-17 14:10:24.375866728 +0000 UTC m=+72.541876099 lastScaleDownDeleteTime=2024-04-17 13:09:28.327657656 +0000 UTC m=-3583.506332776 lastScaleDownFailTime=2024-04-17 13:09:28.327657656 +0000 UTC m=-3583.506332776 scaleDownForbidden=false scaleDownInCooldown=true
...
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:20:34.837762       1 clusterapi_provider.go:68] discovered node group: MachineDeployment/default/cbzzz-capl-md-0 (min: 1, max: 10, replicas: 2)
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:20:38.871307       1 drain.go:131] All pods removed from cbzzz-capl-md-0-2skxq-dtld8
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:20:38.876928       1 clusterapi_controller.go:714] node "cbzzz-capl-md-0-2skxq-dtld8" is in nodegroup "MachineDeployment/default/cbzzz-capl-md-0"
default/cbzzz-capl-cluster-autoscaler-7dfd67c69d-dbhwh[cluster-autoscaler]: I0417 14:20:38.920265       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"7a7668d7-646d-449c-b5a7-49dcc4f96aac", APIVersion:"v1", ResourceVersion:"6415", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: empty node cbzzz-capl-md-0-2skxq-dtld8 removed

# Node events on workload cluster
$ KUBECONFIG=/tmp/${CLUSTER_NAME}-kubeconfig kubectl events --for node/
6m10s               Normal    ScaleDown                 Node/cbzzz-capl-md-0-2skxq-dtld8      marked the node as toBeDeleted/unschedulable
6m1s                Normal    RemovingNode              Node/cbzzz-capl-md-0-2skxq-dtld8      Node cbzzz-capl-md-0-2skxq-dtld8 event: Removing Node cbzzz-capl-md-0-2skxq-dtld8 from Controller
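For a faster scale-down while testing, one hedged option is to append the flag to the autoscaler's container args with a JSON patch. The Deployment name and args layout below are assumptions based on this flavor's naming; adjust them to match your generated manifests:
# Assumed example: appends --scale-down-unneeded-time=1s to the first container's args
$ kubectl patch deployment ${CLUSTER_NAME}-cluster-autoscaler --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--scale-down-unneeded-time=1s"}]'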

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

  • Due to constraints with the Kubernetes RBAC system (i.e. roles cannot be subdivided beyond namespace-granularity), the Cluster Autoscaler add-on is deployed on the management cluster to prevent leaking Cluster API data between workload clusters.

  • Currently, the Cluster Autoscaler reuses the ${CLUSTER_NAME}-kubeconfig Secret generated by the bootstrap provider to interact with the workload cluster. The kubeconfig contents must be stored in a key named value (a quick way to inspect this is sketched below). Because of this, all Cluster Autoscaler actions in the workload cluster are performed as the cluster-admin role, which may be more permissive than necessary 🙈.

See: https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/autoscaling#autoscaler-running-in-management-cluster-using-service-account-credentials-with-separate-workload-cluster
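
A minimal sketch of the Secret shape this assumes (the kubeconfig must live under a key named value, matching the command in the testing steps above):
# Inspect the Secret generated by the bootstrap provider on the management cluster
$ kubectl get secret ${CLUSTER_NAME}-kubeconfig -o jsonpath='{.data.value}' | base64 -d | head
# Hypothetical example: a kubeconfig supplied by hand would need the same key name
$ kubectl create secret generic ${CLUSTER_NAME}-kubeconfig --from-file=value=/path/to/kubeconfig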

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • adds or updates e2e tests

codecov bot commented Apr 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.83%. Comparing base (1b0a785) to head (0f13cea).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #251   +/-   ##
=======================================
  Coverage   53.83%   53.83%           
=======================================
  Files          27       27           
  Lines        1566     1566           
=======================================
  Hits          843      843           
  Misses        673      673           
  Partials       50       50           

@AshleyDumaine AshleyDumaine self-requested a review April 15, 2024 14:47
@AshleyDumaine AshleyDumaine added the feature New feature or request label Apr 15, 2024
1. Set up autoscaling environment variables
```sh
export CLUSTER_AUTOSCALER_VERSION=v1.29.0
export WORKER_MACHINE_MIN="\"1\""
Collaborator

could we move this string escaping into the template instead of the variable?

Member

I don't think escaping should be needed, it's already quoted in the template

Contributor Author

I can give this another shot, but envsubst didn't play well with quoting numbers as strings in the templating. It was either stripping them too aggressively or not enough.

Contributor Author
@cbzzz cbzzz Apr 17, 2024

The issue here seems to be how clusterctl generate renders templates in conjunction with envsubst. From what I can see, it:

  1. Renders the templates and then validates the Kubernetes YAML
  2. Passes the generated YAML through envsubst

envsubst substitutes both ${var} and "${var}" so you need to explicitly specify the shell variable as a "string".

I've slightly modified the documentation commands to remove the escapes and also provided defaults for these values.
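
For context, the min/max values ultimately land as the upstream Cluster API autoscaler annotations on the MachineDeployment, and Kubernetes annotation values must be strings, which is why the quoting matters. A hedged sketch of the rendered result (fragment only; the values shown are an assumed example, not this PR's exact template output):
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "10"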

@cbzzz cbzzz marked this pull request as ready for review April 17, 2024 16:51
@cbzzz cbzzz force-pushed the feat.autoscaling branch 2 times, most recently from 8e75985 to 1c70a57 Compare April 17, 2024 18:16
Member
@AshleyDumaine AshleyDumaine left a comment

LGTM, I can't figure out a better workaround at the moment for the scaling annotations 🤔

Adds a new cluster-autoscaler flavor that provides an autoscaling add-on for
workload cluster nodes via [Cluster Autoscaler](https://www.github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#cluster-autoscaler).

Due to constraints with the Kubernetes RBAC system (i.e. [roles cannot be
subdivided beyond
namespace-granularity](https://www.github.com/kubernetes/kubernetes/issues/56582)),
the Cluster Autoscaler add-on is deployed on the management cluster to prevent
leaking Cluster API data between workload clusters.

Currently, the Cluster Autoscaler reuses the `${CLUSTER_NAME}-kubeconfig` Secret
generated by the bootstrap provider to interact with the workload cluster. The
kubeconfig contents must be stored in a key named `value`. Due to this, all
Cluster Autoscaler actions in the workload cluster are performed as the
`cluster-admin` role.

See: https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/autoscaling#autoscaler-running-in-management-cluster-using-service-account-credentials-with-separate-workload-cluster
@cbzzz cbzzz merged commit e9dc9fa into main Apr 18, 2024
9 checks passed
@cbzzz cbzzz deleted the feat.autoscaling branch April 18, 2024 14:22
AshleyDumaine pushed a commit that referenced this pull request Apr 19, 2024
* feat: add cluster autoscaler flavor

* docs: add cluster autoscaler flavor
amold1 pushed a commit that referenced this pull request May 17, 2024
* feat: add cluster autoscaler flavor

* docs: add cluster autoscaler flavor